-
- Extracting Personal Names
-
- This menu selection is new to this version of PC─INDEX. Extract
- Personal Names scans a document for personal names (first and
- last names) and writes them to a phrase file. You can then use
- this file to create a name index, or merge it with another phrase
- file to create a more comprehensive index that includes names.
-
- This selection is not guaranteed to find every name in a
- document, but it is a good starting point. When it errs, it
- usually extracts capitalized words that are not really names
- rather than missing real names.
-
- To use this option effectively, it helps to understand what it
- does. PC─INDEX scans a document until it finds at least two
- capitalized words in a row. It then looks up the first of these
- words in the Personal Name File. If the word is found there, the
- sequence of capitalized words is assumed to be a person's name.
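-
- The scanning rule above can be sketched in a few lines of Python.
- This is a minimal illustration under stated assumptions, not
- PC─INDEX's actual code: FIRST_NAMES is a tiny hypothetical stand-in
- for the Personal Name File, the tokenizer is simplified, and
- cutting a long run off at max_words is an assumed behavior.
-
```python
import re
from typing import List, Set

# Hypothetical stand-in for the Personal Name File (the real file lists 12,000+ first names).
FIRST_NAMES = {"Brian", "Mary"}

# Simplified tokenizer: runs of letters are words; every other non-space
# character (punctuation, digits) is its own token, so it breaks a run.
TOKEN = re.compile(r"[A-Za-z]+|[^\sA-Za-z]")


def is_capitalized(token: str) -> bool:
    return token[0].isupper()


def extract_names(text: str, first_names: Set[str], max_words: int = 3) -> List[str]:
    """Return runs of capitalized words whose first word is a known first name."""
    tokens = TOKEN.findall(text)
    names = []
    i = 0
    while i < len(tokens) - 1:
        # Two capitalized words in a row, and the first is in the name list.
        if (is_capitalized(tokens[i]) and is_capitalized(tokens[i + 1])
                and tokens[i] in first_names):
            j = i
            # Extend the run, stopping at max_words (assumed behavior for longer runs).
            while j < len(tokens) and j - i < max_words and is_capitalized(tokens[j]):
                j += 1
            names.append(" ".join(tokens[i:j]))
            i = j
        else:
            i += 1  # first word is not a known name: try the next position
    return names


print(extract_names("Yesterday Brian Brian BENSON met Mary Street.", FIRST_NAMES))
# ['Brian Brian BENSON', 'Mary Street']
```
-
- Both behaviors described in this section show up in the output:
- the doubled Brian is kept as written, and Mary Street is extracted
- simply because Mary is a known first name.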
-
- The Personal Name File contains over 12,000 first names. You may
- want to browse through the list with Edit Personal Name File (on
- the Edit List Menu) to make sure it contains the names you need.
-
- When you select Extract Personal Names, you will see a screen
- asking you for an Input File Name, an Output File Name, the
- Maximum Number of Words in a Name, and information regarding the
- surname (last name).
-
- For the input file name, enter the name of the document you want
- to extract names from. For the output file name, enter any name
- you like; a file name with the extension '.dbf' is recommended.
-
- The maximum number of words in a name can be any number from 2 to
- 6: a name needs at least a first and a last name, and may have no
- more than 6 words. In addition, a name may be no longer than 70
- characters in total. For this example, enter 3 for the Maximum
- Number of Words in a Name.
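-
- The two limits above (2 to 6 words, at most 70 characters) amount
- to a simple check. In this hypothetical sketch the function names
- are illustrative, and the character count is assumed to include
- the spaces between words; the manual does not say.
-
```python
# Limits stated in the manual; whether spaces count toward the 70 is an assumption.
MIN_WORDS = 2
MAX_WORDS_ALLOWED = 6
MAX_NAME_CHARS = 70


def valid_max_words(setting: int) -> bool:
    """Is this an acceptable value for Maximum Number of Words in a Name?"""
    return MIN_WORDS <= setting <= MAX_WORDS_ALLOWED


def name_fits(name: str, max_words: int) -> bool:
    """Does a candidate name respect the word-count and 70-character limits?"""
    word_count = len(name.split())
    return MIN_WORDS <= word_count <= max_words and len(name) <= MAX_NAME_CHARS


print(valid_max_words(3), name_fits("Brian BENSON", 3), name_fits("Brian Brian BENSON", 2))
# True True False
```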
-
- The last three choices tell PC─INDEX how to recognize last names.
- They were added to help PC─INDEX find names faster and more
- accurately.
-
- The fastest and most accurate method for extracting names is Last
- Name is ALL CAPS. To use this option, every surname must be
- written entirely in capital letters, and no other word in a name
- may be written in all caps. If you cannot put last names in all
- caps, use one of the other options. If it doesn't matter to you
- whether last names are in all caps, use all caps: the gain in
- speed and accuracy is significant.
-
- The next option, Last Name is not ALL CAPS, tells PC─INDEX that no
- name will be written in all caps. This is the second fastest and
- second most accurate method for extracting names.
-
- The last option, Last Name may or may not be ALL CAPS, should be
- selected if capital letters are not used consistently in names.
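-
- One way to read the three choices is as rules applied to each run
- of capitalized words. The sketch below is a hypothetical
- illustration of that reading, not PC─INDEX's code: SurnameMode,
- trim_name, and the treatment of single-letter initials are all
- assumptions. The enum values reuse the labels from the screen.
-
```python
from enum import Enum
from typing import List, Optional


class SurnameMode(Enum):
    ALL_CAPS = "Last Name is ALL CAPS"
    NOT_ALL_CAPS = "Last Name is not ALL CAPS"
    EITHER = "Last Name may or may not be ALL CAPS"


def is_all_caps(word: str) -> bool:
    # Assumption: single letters (initials such as "J") do not count as all caps.
    return len(word) > 1 and word.isupper()


def trim_name(words: List[str], mode: SurnameMode) -> Optional[List[str]]:
    """Apply a surname rule to a run of capitalized words; None means 'not a name'."""
    if mode is SurnameMode.ALL_CAPS:
        # Only the surname may be all caps, so the first all-caps word ends the name.
        if is_all_caps(words[0]):
            return None
        for k in range(1, len(words)):
            if is_all_caps(words[k]):
                return words[: k + 1]
        return None  # no all-caps surname: reject the run
    if mode is SurnameMode.NOT_ALL_CAPS:
        # No word in a name may be all caps.
        return None if any(is_all_caps(w) for w in words) else words
    return words  # EITHER: no capitalization test can be applied


print(trim_name(["Brian", "BENSON", "Memorial"], SurnameMode.ALL_CAPS))  # ['Brian', 'BENSON']
print(trim_name(["Mary", "Street"], SurnameMode.ALL_CAPS))  # None
```
-
- Under this reading, the all-caps rule both rejects runs with no
- surname (Mary Street) and marks where a name ends (Brian BENSON
- Memorial), which is consistent with the manual's advice that it is
- the most accurate choice; the mixed option allows no such test.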
-
- For this example, select Last Name is ALL CAPS.
-
- The completed screen should look something like this:
-
- ┌───────────────────────────────────────────────────────┐
- │ Input File Name: (Name of Document to process)        │
- │   pci.doc                                             │
- │                                                       │
- │ Output File Name:                                     │
- │   pcinames.dbf                                        │
- │                                                       │
- │ Maximum Number of Words in a Name (2 ─ 6)             │
- │   3                                                   │
- │                                                       │
- │ X Last Name is ALL CAPS                               │
- │                                                       │
- │   Last Name is not ALL CAPS                           │
- │                                                       │
- │   Last Name may or may not be ALL CAPS                │
- └───────────────────────────────────────────────────────┘
-
- When you have finished entering the filenames and other
- information, press F10 to begin processing.
-
- You should see a status box showing the number of words to be
- processed, the number of words actually processed, the number of
- names found, the percentage completed, and the elapsed time.
-
- When processing is complete, browse through the extracted names
- by selecting Edit Extracted Name File from the Edit List Menu.
- There you can correct names, delete entries, or add names to the
- list manually.
-
- If you are following the entries in this example, the Extracted
- Name File should look like this:
-
- ┌────────────────────────────────────────────────────────────┐
- │ ┌─────────────────── Edit Phrase List ───────────────────┐ │
- │ │                                                        │ │
- │ │ BENSON                                                 │ │
- │ │ BENSON                                                 │ │
- │ │ BENSON                                                 │ │
- │ │ BENSON                                                 │ │
- │ │ WILLIAMS                                               │ │
- │ └────────────────────────────────────────────────────────┘ │
- │                                                            │
- │ ┌─────────────── Display Complete Phrase ────────────────┐ │
- │ │ BENSON                                                 │ │
- │ │ Brian                                                  │ │
- │ │ Brian BENSON                                           │ │
- │ └────────────────────────────────────────────────────────┘ │
- └────────────────────────────────────────────────────────────┘
-
- You may want to merge the extracted name file with a phrase file
- so that your index contains both names and phrases. Since the
- extracted name file is actually a phrase file, you can use Merge
- Phrase Files (on the Merge Files Menu) to do this.
-
- You may notice that one entry lists the name Brian Brian BENSON.
- This is not really a mistake. If you look at page 13 (or at the
- example above), you will see that the name Brian appears twice
- before BENSON. PC─INDEX makes no attempt to find possible
- mistakes; it only finds sequences of names. This is one example of
- why you should edit the extracted name list before you create an
- index.
-
- To merge a name file with a phrase file, use pcinames.dbf as the
- Input Merge File Name and phrase.dbf as the Output Merge File
- Name. After this step, all extracted names will be in the
- standard phrase file.
-
- If you only have a few names in your document, you may want to
- consider adding them manually to your phrase file.
-